53 research outputs found

    Understanding BatchNorm in Ternary Training

    Get PDF
    Neural networks are composed of two components: weights and activation functions. Ternary weight neural networks (TNNs) achieve good performance and offer up to a 16x compression ratio. TNNs are difficult to train without BatchNorm, and there has been no study clarifying the role of BatchNorm in a ternary network. Benefiting from a study of binary networks, we show how BatchNorm helps resolve the exploding-gradients issue.
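
    As a concrete illustration (not necessarily the scheme studied in this paper), a common way to ternarize weights is threshold-based: small weights become 0 and the rest become +/-alpha, with alpha chosen as the mean magnitude of the surviving weights. Storing three levels in 2 bits instead of a 32-bit float is where the roughly 16x compression ratio comes from. The function name and the 0.7 threshold factor below are illustrative choices.

        import numpy as np

        def ternarize(w, delta_scale=0.7):
            """Threshold-based ternarization of a weight tensor (a generic sketch)."""
            delta = delta_scale * np.mean(np.abs(w))             # per-tensor threshold
            mask = np.abs(w) > delta                             # weights kept as non-zero
            alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
            return alpha * np.sign(w) * mask                     # values in {-alpha, 0, +alpha}

        # Example: ternarize a random weight matrix
        w = np.random.randn(4, 4).astype(np.float32)
        print(ternarize(w))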

    Binary Quantizer

    Get PDF
    One-bit quantization is a general tool for executing a complex model, such as a deep neural network, on a device with limited resources, such as a cell phone. Naively compressing weights into one bit yields a substantial accuracy loss, so one-bit models require careful re-training. Here we introduce a class of functions devised to be used as a regularizer for re-training one-bit models. Using a regularization function specifically devised for binary quantization avoids heuristic tweaking of the optimization scheme and saves considerable coding effort.
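
    As a minimal sketch of the idea behind a binary-quantization regularizer (the paper's exact family of functions may differ), a W-shaped penalty that vanishes at w = +/-1 pulls weights towards the two binary levels during re-training, so the quantization pressure sits in the loss rather than in ad-hoc changes to the optimizer. The name binary_reg and the |1 - w^2| form are illustrative.

        import numpy as np

        def binary_reg(w, beta=1.0):
            """Generic W-shaped penalty: zero at w = +/-1, growing away from them."""
            return beta * np.abs(1.0 - w**2)

        # During re-training the penalty is simply added to the task loss:
        #   total_loss = task_loss + sum(binary_reg(w).sum() for w in weights)
        w = np.linspace(-2.0, 2.0, 9)
        print(binary_reg(w))   # minima at w = -1 and w = +1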

    Deep Learning Inference Frameworks for ARM CPU

    Get PDF
    The deep learning community focuses on training networks for better accuracy on GPU servers. However, bringing this technology to consumer products requires inference adaptation of such networks for low-energy, small-memory, and computationally constrained edge devices. The ARM CPU is one of the important components of edge devices, but a clear comparison between the existing inference frameworks is missing. We provide minimal preliminaries about the ARM CPU architecture, briefly describe the differences between the existing inference frameworks, and evaluate them based on performance versus usability trade-offs.
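
    For reference, the performance side of such a comparison usually reduces to measuring per-inference latency on the target CPU. The sketch below does this with TensorFlow Lite, one of the runtimes typically compared on ARM; the model_path, thread count, and timing loop are illustrative, and the same warm-up-then-average pattern applies to other frameworks.

        import time
        import numpy as np
        import tensorflow as tf  # on a bare ARM board, tflite_runtime can replace this

        # Hypothetical model file; any .tflite model is handled the same way.
        interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
        interpreter.allocate_tensors()
        inp = interpreter.get_input_details()[0]

        x = np.random.rand(*inp["shape"]).astype(inp["dtype"])

        # Warm up, then average repeated invocations to estimate latency.
        for _ in range(5):
            interpreter.set_tensor(inp["index"], x)
            interpreter.invoke()

        runs = 50
        start = time.perf_counter()
        for _ in range(runs):
            interpreter.set_tensor(inp["index"], x)
            interpreter.invoke()
        print(f"average latency: {(time.perf_counter() - start) / runs * 1e3:.2f} ms")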

    Fast high-dimensional Bayesian classification and clustering

    Get PDF
    We introduce a fast approach to classification and clustering applicable to high-dimensional continuous data, based on Bayesian mixture models for which explicit computations are available. This permits us to treat classification and clustering in a single framework and allows calculation of unobserved class probabilities. The new classifier is robust to the addition of noise variables, a by-product of the built-in spike-and-slab structure of the proposed Bayesian model. The usefulness of classification using our method is shown on a metabolomic example and on the Iris data with and without noise variables. Agglomerative hierarchical clustering is used to construct a dendrogram based on the posterior probabilities of particular partitions, so that the dendrogram has a probabilistic interpretation. An extension to variable selection is proposed which summarises the importance of variables for classification or clustering and also has a probabilistic interpretation. Having a simple model allows estimation of the model parameters by maximum likelihood and therefore yields a fully automatic algorithm. The new clustering method is applied to metabolomic, microarray, and image data, and is studied using simulated data motivated by real datasets. The computational difficulties of the new approach are discussed, solutions for accelerating the algorithm are proposed, and the computer code is briefly analysed. Simulations show that the quality of the estimated model parameters depends on the parametric distribution assumed for the effects, but after fixing the model parameters to reasonable values, the distribution of the effects influences clustering very little. Simulations also confirm that the clustering algorithm and the proposed variable selection method are reliable when the model assumptions are wrong. The new approach is compared with a popular Bayesian clustering alternative, MCLUST, fitted on the principal components, using two loss functions; our proposed approach is found to be more efficient in almost every situation.
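
    To make the Bayes-rule step concrete, here is a small sketch that fits per-class diagonal Gaussians by maximum likelihood and returns posterior class probabilities. It omits the spike-and-slab prior and the clustering part of the proposed model, and the function names are illustrative.

        import numpy as np
        from scipy.stats import norm
        from sklearn.datasets import load_iris

        def fit(X, y):
            """ML estimates of per-class means, variances, and priors (diagonal Gaussians)."""
            params = {}
            for c in np.unique(y):
                Xc = X[y == c]
                params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6, len(Xc) / len(X))
            return params

        def posterior(X, params):
            """Posterior class probabilities via Bayes' rule on the fitted Gaussians."""
            log_p = np.column_stack([
                norm.logpdf(X, mu, np.sqrt(var)).sum(axis=1) + np.log(prior)
                for mu, var, prior in params.values()
            ])
            log_p -= log_p.max(axis=1, keepdims=True)   # stabilise before exponentiating
            p = np.exp(log_p)
            return p / p.sum(axis=1, keepdims=True)

        # Toy usage on the Iris data mentioned in the abstract
        X, y = load_iris(return_X_y=True)
        print(posterior(X[:3], fit(X, y)))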

    Activation Adaptation in Neural Networks

    Full text link
    Many neural network architectures rely on the choice of the activation function for each hidden layer. Given the activation function, the neural network is trained over the bias and weight parameters. The bias captures the center of the activation, and the weights capture its scale. Here we propose to train the network over a shape parameter as well. This view allows each neuron to tune its own activation function and adapt its curvature towards a better prediction. The modification adds only one further equation to back-propagation for each neuron. Re-formalizing activation functions as cumulative distribution functions (CDFs) generalizes the class of activation functions extensively. We aim to generalize an extensive class of activation functions to study i) skewness and ii) smoothness of activation functions. Here we introduce the adaptive Gumbel activation function as a bridge between the Gumbel and sigmoid functions. A similar approach is used to derive a smooth version of ReLU. Our comparison with common activation functions suggests a different data representation, especially in early neural network layers. This adaptation also improves prediction.
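
    As an illustration of the shape-parameter idea, one convenient parameterization (close in spirit to, though not necessarily identical to, the paper's adaptive Gumbel) is f(x) = 1 - (1 + s*exp(x))^(-1/s): at s = 1 it is exactly the logistic sigmoid, and as s -> 0 it approaches a Gumbel-type CDF, so a single learnable s per neuron controls the skewness of the activation. In training, s would receive its own gradient alongside the weights and bias.

        import numpy as np

        def adaptive_gumbel(x, s):
            """Shape-parameterized activation: s = 1 gives the sigmoid, s -> 0 a Gumbel-type CDF."""
            return 1.0 - (1.0 + s * np.exp(x)) ** (-1.0 / s)

        x = np.linspace(-4.0, 4.0, 9)
        print(np.allclose(adaptive_gumbel(x, 1.0), 1.0 / (1.0 + np.exp(-x))))  # True: sigmoid
        print(adaptive_gumbel(x, 0.1))                                         # more skewed shape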